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The dynamics of short-lived mRNA results in bursts of protein production in gene regulatory 
networks. We investigate the propagation of bursting noise between different levels of mathematical 
modelling, and demonstrate that conventional approaches based on diffusion approximations can fail 
to capture bursting noise. An alternative coarse-grained model, the so-called piecewise deterministic 
Markov process (PDMP), is seen to outperform the diffusion approximation in biologically relevant 
parameter regimes. We provide a systematic embedding of the PDMP model into the landscape of 
existing approaches, and we present analytical methods to calculate its stationary distribution and 
switching frequencies. 


Transcription and translation in the process of gene 
expression occur at the molecular level and in environ¬ 
ments of relatively small copy numbers. The discreteness 
of the molecular dynamics and the inherent randomness 
with which reactions occur are known as ‘intrinsic noise’. 
It is now widely accepted that intrinsic noise plays an im¬ 
portant role in gene regulatory networks 1-3 . It promotes 
epigenetic diversity and enhances the adaptability of a 
single phenotype in changing environments 4,5 . To inves¬ 
tigate the effects of intrinsic noise, mathematical models 
at different levels have been constructed, ranging from 
microscopic models 3,6 10 describing the finer origins of 
intrinsic noise to mesoscopic models 11-15 . While the for¬ 
mer capture the biological processes in more detail, the 
latter are computationally scalable and constructed to 
model more complex networks. These models all capture 
some signatures of intrinsic noise, but the detailed imple¬ 
mentation of stochasticity varies from model to model. 
It is then important to consider how noise propagates 
between different levels of mathematical modelling. At 
present coarse-grained models are often proposed ad hoc 
and not derived from the more detailed lower-scale mod¬ 
els. Is this always mathematically appropriate? What 
statistics of noise should modellers use at the different 
levels of coarse graining? What are the consequences of 
the choice of noise statistics, and what are the pitfalls in 
deriving models on the meso-level from finer models on 
smaller scales? These are some of the questions we aim 
to address in this work. 

The above difficulties in transitioning between dif¬ 
ferent levels of modelling can nicely be illustrated in 
the context of biological switches. These are systems 
with different metastable states and the possibility to 
‘switch’ between those states. Biological organisms with 
such behaviour include the Lac switch 6 in Escherichia 
coli and the Enterobacteria phage A switch 7 . Compu¬ 
tational and mathematical models of these range from 
very detailed descriptions 6,7 over individual-molecule 
approaches 8-10,16 to mesoscopic models 11,12,14,15 . 

The difficulties in connecting these different levels of 


modelling biological switches are amplified by the recent 
recognition that the mRNA populations are essential to 
describing the statistics of regulatory processes 16,17 . Bi¬ 
ologically, mRNA molecules are a relatively short-lived 
source compared to the proteins into which they ul¬ 
timately translate. Protein production from a given 
mRNA molecule proceeds while it exists, but ceases after 
the mRNA decays. This leads to a production of protein 
in bursts—that is, the production is active for a relatively 
short and random period of mRNA lifetime, and dur¬ 
ing that time a random number of proteins is generated. 
This phenomenon is termed translational bursting 1 and it 
can be observed in single-molecule experiments 18 . While 
some mesoscopic models account for such bursting 14,15 , 
the theoretical investigation of these processes is often 
limited to their stationary distribution and frequently 
does not include dynamic features such as switching 
times. 

The aim of our work is to investigate the effects burst¬ 
ing noise in gene regulatory networks 9-11,13,16,19-21 , and 
to construct connections between individual-based mod¬ 
els and mesoscopic approaches. Specifically, we start 
from microscopic and individual-molecule-based models 
of a toggle switch and set out to construct coarse grained, 
mathematically tractable models without systematically 
biasing the outcomes. 

RESULTS 

Different scales of individual-based models of a 
toggle switch network. We compare four individual- 
based models and investigate the effect of bursting noise 
in a toggle switch network. The first model we consider 
describes both the mRNA and the protein population 
dynamics 16 . Fig. la illustrates the Markovian model of 
the regulatory network. Genes X and Y are transcribed 
into mRNA X and mRNA Y, respectively, which in turn 
are translated to produce proteins X and Y. The tran¬ 
scription of each of the two genes is suppressed by pro- 
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FIG. 1. Schematic diagrams illustrating the model dynamics, (a) Full model (FM) describing both the mRNA and the 
protein populations; (b) Protein-only model with geometrically distributed (GB) or constant (CB) bursts. The quantity B is 
a geometrically distributed random number with mean B in the GB model, and B — B is a constant in the CB model; (c) 
Protein-only model without bursts (NB); (d) The piecewise deterministic Markov process (PDMP). 


teins of the respective other type via a Hill function 2,3 
H(N) = K[r 0 + r/[l + (N/K ) n ]], where N stands for 
the number of suppressing proteins. The model param¬ 
eter K represents a typical population scale of the pro¬ 
teins, and the parameters r and ro set the minimal (roK) 
and maximal transcription rates ((ro+r)K). The param¬ 
eter n > 0 is the so-called Hill coefficient which models 
the cooperative binding of the repressors 3 . More details 
of the reaction scheme can be found in the Supplemen¬ 
tary Information. Proteins of either type, and the mRNA 
molecules degrade with constant rates 70 and 7 respec¬ 
tively. Biologically, mRNA molecules degrade much 
faster than the proteins do (7 >> 7o) 2,14,18 - The trans¬ 
lation rate of the mRNA is parametrised by 7 B where 
the parameter B is the relative frequency of protein pro¬ 
duction to mRNA degradation. In this parametrisation, 
the number of proteins one single mRNA molecule pro¬ 
duces during its lifetime is a geometrically distribution 
random variable with mean B (see Supplementary Infor¬ 
mation). Biologically the parameter B varies depending 
on the type of product protein 24 . We assume B > 10 
in this work 2,8 to investigate the effect of translational 
bursting. Together with the relatively short lifetime of 
mRNA molecules, this constitutes the origin of ‘trans¬ 
lational bursting’ in the model 14,25 : a relatively large 
number of protein molecules is synthesized in a relatively 
short period of time. 

For simplicity, the process in Fig. la is assumed to be 


symmetric with respect to X and Y, but the analysis is 
easily generalised to asymmetric circuits. In Table I we 
list a set of estimated values of the parameters for the 
model organism E. coli , along with relevant references. 

In the context of this work the model just described 
constitutes the most detailed model we will investigate 
and compare against. It serves as a starting point for the 
derivation of more coarse grained models, and for these 
purposes we will refer to it as the ‘full model’ (FM) in 
the following. 

The FM describes both the mRNA and the pro¬ 
tein populations, hence it constitutes a relatively high¬ 
dimensional system which complicates the mathematical 
analysis. Notably, the only role of mRNA in the FM is to 
generate proteins, and so mRNA can be left out, so long 
as the correct statistics of protein production is retained. 
The timescale separation between the mRNA and protein 
lifetimes leads to the following reduced model describing 
only the protein dynamics. In the limit of infinitely-fast 
mRNA degradation (7 70), proteins are generated in¬ 

stantaneously in bursts of geometrically distributed sizes 
with a mean H, and in between bursting events protein 
populations decay with rate 70. We will refer to the re¬ 
duced model as the GB model (geometrically distributed 
bursts), see Fig. lb 17,24 . In the GB model, the transcrip¬ 
tion rates are regulated via the Hill function exactly as 
before in the FM. 

A further reduction of the GB model involves replac- 
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Parameter 

Description 

Value 

Unit 

Reference 

B 

Average number of protein each mRNA produces 

30 

molecule 

2 

7 

mRNA degradation rate 

30 

1 / (cell cycle) 

2 

7o 

Protein degradation rate 

1.0 

1 / (cell cycle) 

2,9,22 

r 

Maximum suppressed transcription rate 

6/100 

1 / (cell cycle) 

13,23a 

ro 

Basal transcription rate 

1/150 

1 / (cell cycle) 

13,23b 

K 

A typical population scale of the proteins 

200 

molecule 

13,23c 

n 

Hill coefficient 

3.0 

Dimensionless 

9,13,19,23 


a In 13 r = 1.8 and the time unit is defined as the inverse of the protein degradation rate. In our full model we use this value, normalized 
by to the mean burst size B = 30 molecules (r = 1.8/30 = 0.06). 

b j n i3 rQ _ Q 2 After normalising with respect to the burst size 30, we obtain 1/150. In 23 ro = .05r, which is of the same order as 13 . 
c In 13 K is set to be 200 molecules. In 23 only the deterministic dynamics are provided and r + ro = 4.0. To match the protein 
population scale ~ 400 in 13 ’ 22 , we impose rK = 400, resulting in a typical population scale of the proteins K ~ 100 molecules, which 
is of the same order as that of 13 . 


TABLE I. Parameter set. 


ing the geometrically distributed burst sizes by a con¬ 
stant size B. We will call this the CB model (constant 
bursts) 8 . While the reduction of the full model to the 
model with geometrically distributed bursts is well con¬ 
trolled and exact in the limit 7 70, the effects of in¬ 

troducing constant burst sizes are unclear at this stage, 
and require a detailed analysis (see below). 

An even more reduced model is a model with no 
bursts 8-10 , we will refer to this as the NB model. 
The reaction scheme is illustrated in Fig. lc. In this 
model, only one single protein is synthesized when 
a transcription event occurs. We assume a 5 -fold 
increased transcription rate so that the average number 
of proteins synthesized per unit time is consistent with 
the FM, GB, and CB models. 

Only the GB model approximates the station¬ 
ary distribution of the FM. Numerical simulations 
of each of the models are carried out using standard 
methods 26,27 . In the following we present statistical 
properties of the models, leaving typical sample paths to 
the Supplementary Information. Fig. 2 displays the nu¬ 
merically computed stationary distributions for the FM, 
GB, CB and NB models. They illustrate that the pro¬ 
files of protein expressions in different model settings are 
quite distinct. This is due to the different representation 
of the underlying intrinsic noise. 

While the stationary distributions of the FM and the 
GB model are in good agreement with each other, sub¬ 
stantial discrepancies from the full model are found in 
the CB and NB models. In the CB model the station¬ 
ary distribution of protein numbers is very localised com¬ 
pared to the FM and the GB model. In the NB model 
the probability distribution is even more sharply concen¬ 
trated. This is because the NB model misses out two per¬ 
tinent sources of noise. Bursting production in the CB 
model amplifies the stochasticity of transcription events 
and leads to a broadening of the protein distribution. 
Adding randomly distributed burst sizes (GB model) in¬ 


troduces further stochasticity, and diversifies of protein 
numbers even further. 

Based on these results, we conclude that the bursting 
noise introduced by the mRNA populations significantly 
broadens the stationary distribution. In addition, the 
GB model approximates the FM model significantly bet¬ 
ter than the CB and NB models do. We can effectively 
discard the CB and NB models as faithful representation 
of the FM, and our subsequent discussion hence focuses 
mostly on the GB model. 

The GB model approximates the mean first 
switching time of the FM. The toggle switch has two 
dynamic attractors, one in which protein X is highly ex¬ 
pressed and where protein Y has a low concentration, 
and the other with inverted roles by symmetry. Start¬ 
ing from one attractor the switch can be driven to the 
other attractor by fluctuations. The timescale of such a 
transition quantifies the dynamical stability: the longer 
the timescale, the more stable the system is at the ini¬ 
tial position. As we will study next, the way in which 
the bursting production of protein is implemented signif¬ 
icantly affects the timescale of these switching processes. 

Starting from initial condition iVx(0) = n X: 0 and 
7Vy(0) = riyfi, we define the first switching time as 
the time it takes a sample path to reach the symmetric 
boundary Nx = Yy. Mathematically, the first switch¬ 
ing time is a random variable. The mean first switching 
time (MFST) is then the average value of the random 
first switching time. The MFST depends on the initial 
condition (n X: o,n y ^). 

Sweeping across the space of possible initial config¬ 
uration, the MFST of the FM and of the GB model 
are measured in simulations and presented in Fig. 3 . 
We show the MFST of the CB in the Supplementary 
Information. As with the stationary distributions, the 
data in Fig. 3 indicates that the GB model approximates 
the switching times of the full model to a good accuracy. 
We remark that the MFST of the CB model is almost 
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FIG. 2. Stationary distribution of protein numbers, shown 
in the range 0 < Abe, Ny < 700 on a linear scale on both 
axes, (a) FM: Full model describing the mRNA and protein 
populations; (b) GB: protein-only model with geometrically 
distributed bursts; (c) CB: protein-only model with constant 
bursts; and (d) NB: protein-only model without bursts. 



FIG. 3. Mean first switching time as a function of the initial 
protein numbers (0 < iVx, Ny < 700, shown on a linear scale), 
(a) FM: Full model; (b) GB model. 


two coupled Ito stochastic differential equations for the 
concentrations x t = Nx(t)/K and y t = Ny(t)/K. These 
are valid in the limit of large but finite populations 31 and 
of the form 


dx t = v(x t , y t )dt + \JD(xt, yt)dw} x ^, (la) 

dyt = v(y t , x t )dt + y/D(y t , x t )dW t iy \ (lb) 

with drift v and diffusion D given by 


as twice as long as that of the GB and FM models, and 
the switching time in the NB model is longer than 1000 
cell cycles (Supplementary Information). 
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Diffusion approximation of the GB model. The 

evolution of the protein population in the GB model is 
described by a master equation (Supplementary Informa¬ 
tion). Solving master equations mathematically is how¬ 
ever difficult and mostly limited to linear dynamics 24,28 . 
The only realistic way forward for a theoretical analysis 
is often the so-called diffusion approximation. 

In the diffusion approximation, the discrete-molecule 
process is approximated by a Gaussian process for con¬ 
tinuous concentrations—numbers of the different types 
of molecules normalized by a typical population scale. 
The Gaussian process satisfies a diffusion equation (the 
Fokker-Planck equation) 29,30 . Based on these methods, 
it is often possible to calculate or approximate the sta¬ 
tionary behaviour and switching times of model gene 
networks. For existing studies in the context of toggle 
switches see 11-13 . 

Deriving the diffusion approximation of the GB model 
requires modest modifications to the standard Kramers- 
Moyal expansion 29,30 . These modifications are necessary 
to account for the randomness induced by the geomet¬ 
rically distributed burst size. Details of the derivation 
can be found in the Supplementary Information, we here 
only report the final outcome. The expansion results in 


The quantities dW^ and dW represent independent 
Wiener processes. 

The diffusion approximation can only be expected to 
be accurate when molecule numbers are large so that the 
concentations x t and y t are effectively continuous. In 
principle, a similar analysis can also be applied to the 
master equations of the full model. In the FM mRNA 
numbers are rather small though (typically < 5, see Sup¬ 
plementary Information), so the Gaussian approximation 
does not capture the statistics of the intrinsic noise faith¬ 
fully. Similarly further analysis of the CB and NB models 
can be carried out based on the diffusion approximation. 
Given that CB and NB models fail to reproduce the be¬ 
haviour of the FM, these results are relegated to the Sup¬ 
plementary Information. 

Results from simulating the Gaussian process of 
equations (1) are shown in Fig. 4. While the data for 
the stationary distribution (Fig. 4a) looks similar to 
that of the full model (Fig. 2a), noticeable discrepancies 
are manifest in the mean first switching times (compare 
Fig. 4b and Fig. 3a). In Fig. 4c and d, we show the 
differences between simulation outcomes of the full 
model and those of the diffusion approximation of the 
GB model. Although the GB model itself approximates 
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FIG. 4. Diffusion approximation of the protein-only model 
with geometrically distributed random bursts (GB); (a) Sta¬ 
tionary distribution as a function of the protein numbers; (b) 
Mean first switching time (MFST) as a function of the initial 
protein numbers, in the unit of cell cycles; (c) Net deviation 
of the stationary distribution from the full model; (d) Net 
deviation of the MFST from the FM. All axes show the range 
0 < Nx, Ny < 700 on linear scales. 


the full model well (Figs. 2 and 3), we conclude that 
the diffusion approximation fails to capture the relevant 
model outcomes. 


Constructing a mesoscopic piecewise determinis¬ 
tic Markov process. We have seen that the diffusion 
approximation of the GB model fails to reproduce the 
statistics of the full model. This underlines the need 
to construct coarse-grained models directly from the full 
model and without the intermediate step of a protein- 
only dynamics. We now proceed to introduce such a 
model. As before we describe protein concentrations by 
continuous variables, x and y. The mRNA dynamics 
are captured by introducing three ‘states’: The 0-state 
describes phases in which no mRNA is present. In the 
X-state there is one mRNA of type X and protein X is 
generated with rate 7 h. The quantity b = B/K is the 
mean burst size in the unit of protein concentration. No 
proteins of type Y are produced in the X-state. Similarly, 
in the Y-state protein Y is generated with rate 7 b. Both 
types of protein are subject to natural degradation with 
rate 70 in any of the three states. 

This is described by the following deterministic differ- 



FIG. 5. PDMP approximation, (a) Stationary distribution; 
(b) Mean first switching time in the unit of cell cycles as a 
function of initial protein numbers, (c) Net deviation of the 
stationary distribution from the full model; (d) Net deviation 
of the MFST of the PDMP model from the FM. All axes are 
on linear scales and show the range 0 < Nx, Ny < 700. 


ential equations: 


0 -state: 

x = — 70 ^ and 

y = -70 y, 

(3a) 

X-state: 

X = 76 - 7 o£ and y = -70 y, 

(3b) 

Y-state: 

x - — 70 X and 

y = ifb- 70 y- 

(3c) 


The rates with which the system transits between the 
states are based on the dynamics of the FM: 

0-state H(<Ky \ X-state, X-state 0-state, 

0-state h(kKx \ Y-state, Y-state 0-state. (4) 

No transitions occur directly between the X and Y states. 
The kinetic scheme is illustrated in Fig. Id. 

The stochasticity and discreteness of the mRNA pop¬ 
ulations is reflected in the random transitioning between 
the 0-, X- and Y-states. Between these Markovian events 
the protein concentrations evolve deterministically. We 
will refer to this model as the piecewise deterministic 
Markov process (PDMP). 

Notably, at most one mRNA molecule of either type 
can be present in PDMP at any time. Although the 
model can be generalised to allow more than one mRNA 
molecule, the analysis below shows that the lowest-order 
approximation is sufficient to capture the relevant 
fluctuations of the mRNA dynamics. 



























6 


The PDMP approximation outperforms the dif¬ 
fusion approximation of the GB model. As in the 

GB model, we work in the limit of infinitely fast degrad¬ 
ing mRNA (7 00 ). Simulations of the PDMP model in 

this limit can be carried out using a minor modification of 
a previously proposed algorithm 15 . We measure the sta¬ 
tionary distribution of the PDMP model and the mean 
first switching times for different initial protein numbers. 
Results are shown in Fig. 5a and b, and we compare the 
outcome against that of the full model in Fig. 5c and d. 

The simulation data indicate that the PDMP ap¬ 
proximation outperforms the diffusion approximation 
of the GB model, and it provides a more faithful 
approximation to the FM. This is because the diffusion 
approximation introduces Gaussian noise. It retains 
some information about the variance of protein pro¬ 
duction and degradation, but it does not capture the 
geometrically distributed burst sizes in the GB model 
well enough. The PDMP approximation, on the other 
hand, models exponentially distributed bursts in protein 
concentration. The exponential distribution in the 
PDMP model is the analogue of the geometric distri¬ 
bution in the discrete-molecule GB model. While the 
PDMP model is an approximation as well, it retains the 
typical characteristics of the stationary distribution and 
switching times of the original model. At the same time 
the PDMP model is suitable for further mathematical 
analysis (see below). 

When does the PDMP outperform the diffusion 
approximation? We now investigate the robustness of 
these findings. In Fig. 6 we vary two essential parame¬ 
ters, the mean burst size £?, and the population scale K , 
while keeping the other parameters fixed. We measure 
the Jensen-Shannon distance 32,33 between the resulting 
stationary distributions of the PDMP and that of the 
the full model. Data is shown in Fig. 6 a and c. We also 
compare the mean first switching times starting from one 
of the stable modes, see Fig. 6 b and d. The figure also 
shows results from the diffusion approximation of the GB 
model. 

Results indicate that PDMP model outperforms the 
diffusion approximation of the GB model for mean 
burst sizes of B > 5. We conclude that the bursting 
noise has to be considered in this biologically relevant 
regime 24 . The PDMP model incorporates only the 
bursting noise and neglects the demographic noise from 
random degradation of the proteins. The strength of this 
demographic noise is proportional to 1/ y/K. The results 
in Fig. 6 c and d indicate that the difference in describ¬ 
ing intrinsic noise propagates to physical observables 
even when the noise is weak (K ~ 1000 for fixed B = 30). 

Analytic investigation of the PDMP process. The 

simplicity of the PDMP approach allows us to proceed 
with a mathematical analysis. We here only outline the 



K K 


FIG. 6. Performance of the PDMP model and the diffusion 
approximation of the GB model (DA-GB). (a) Jensen-Shanon 
distance between the stationary distribution PDMP (and DA- 
GB) and the stationary distribution of the FM; (b) Mean first 
switching time for varying value of B at fixed K = 200; (c-d) 
Similar to (a-b) but now varying K at fixed B — 30. 


main steps, further details are reported in the Supple¬ 
mentary Information. We denote the probability den¬ 
sity that the system is in the 0 -state and with protein 
densities x,y at time t by po(x,y,t). Similarly we write 
px{x,y,t) and py(x,y,t) when the system is in the X- 
or Y-states. The evolution of these distributions then 
follows the forward equation 


d_ 

at 


Po 

Px 

Py 



(5) 


where l) d and L\ drive the deterministic flow and the 
random switching between states respectively. These op¬ 
erators are of the form 



L\ := 


(^)„ ° ° 

0 (^L 0 • 

. 0 0 KU 

' —H(Kx) - H{Ky) 7 7 

H(Ky) -7 0 

H(Kx) 0 -7 


( 6 a) 


( 6 b) 
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with 


(■C'd) n := lod x ( x ) + 70 d y (y) , ( 7 a) 

(■C'd) 22 :=d * (“ 7 ^ + 7 o#) + 70 ^ 3 , (y) , ( 7 b) 

(-^d) := 7 o< 9 ® (#) + d y (-7 b + 7 0 y). ( 7 c) 

The differential operators < 9 X and act on all that follows 
to their right, including the probability densities po,Px 
and py outside the matrix notation in equations ( 5 ) and 
( 6 ). 

The PDMP approximation applies in the limit 7 —> 00, 
i.e., fast return into the 0 -state. The resident time in the 
X- and Y-states is exponentially distributed and scales 
as 7 -1 . It formally tends to zero as 7 —00. On the 
other hand the translation rate 7 B tends to infinity in 
this limit. Combining the limiting behaviours of resident 
time and translation rate results in an exponentially dis¬ 
tributed increment of protein concentration in each cy¬ 
cle of switching from the 0 -state to the X- or Y-state, 
and then returning to the 0-state. As a consequence 
the PDMP converges to previously proposed continuous- 
state bursting models 14,15,18 in the limit 7 —>> 00, and 
Po(x,y,t) satisfies 

dtPo = d x (70XP0) + d y (702/P0) - [H(Kx) + H(Ky)]p 0 

f X 1 x-x' 

4 H(Ky) J -e~^p 0 (x',y,t)dx' 

+ H(Kx)f y ^e- v -^p 0 {x,y',t)dy', (8) 



FIG. 7. Theoretical prediction of the PDMP model, (a) 
Mean first passage time as a function of initial protein num¬ 
bers, calculated from the backward equation (12); (b) Sta¬ 
tionary distribution of protein numbers calculated from the 
WKB method. Axes of both panels show the range 0 < 
iVx(O), Yy(0) < 700 on linear scales. 


(6). They are given by 


L d := 

L s := 


Ud) u 0 0 

0 {Ld) 22 0 i 

0 0 {Ld) 33 _ 

—H(Kx) — H(Ky) H(Ky) H(Kx) 
7 —7 0 

7 0 —7 


with 


(10a) 

(10b) 


{Ld) 11 = - loxd x - 70 ydy, (11a) 

{Ld) 2 2 = ( 7 & - 70 #) d x - 70 (y) d y , (lib) 

{Ld) 33 = - 7oxd x + (7 b - 70 y) d y . (11c) 


as detailed in the Supplementary Information. 

Analytic investigation of the mean first switching 
time. One of the strengths of the PDMP formulation 
(equations ( 5 ) and (6)) is the relative ease with which 
mean first switching times can be obtained. We first 
proceed by computing mean escape time from an arbi¬ 
trary open domain fh The mean first switching time can 
be calculated by setting Q = {(x,y) : x < y}, recognising 
that the process can only exit this domain by crossing 
the boundary x = y. 

Suppose, the system is initially at (x,y) G D, and in 
state Z G { 0 ,X,Y}. We write Tz(x,y) for the mean 
first time at which the process exits the domain Q. 
The quantities Tz then satisfy the following backward 
equation 29,30 


" 1" 


Tq{x, y) 

1 

— (L d + L s ) 

Tx{x,y) 

1 


_T Y (x,y) _ 


where L d and L s are adjoint to the operators in equations 


In the infinitely-fast degrading mRNA limit (7 00), 

and using appropriate boundary conditions (Supplemen¬ 
tary Information) we arrive at 


-1 = [~7 0 xd x - 70 yd v - H(Kx) - H(Ky )] T 0 (x, y) 


+ H(Ky) I* 

J X 


— T 0 {x',y)dx' 



_ y -y 

e b 


T 0 (x,y')dx f . 


( 12 ) 


This is the adjoint equation 29 of the expression in equa¬ 
tion (8) on the open domain D. Equation ( 12 ) is solved by 
a finite difference method, noting that it is self-consistent 
and no boundary condition needs to be specified. The so¬ 
lution is shown in Fig. 7 , and reproduces the simulation 
outcome of the FM well. 

We remark that equation (12) is only valid for the 
half-plane fh A detailed discussion can be found in the 
Supplementary Information. 


Analytic investigation of the weak-noise limit. 

The analytical calculation of the stationary distributions 
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of the PDMP model can be pursued further using the so- 
called Wentzel-Kramers-Brillouin (WKB) method. This 
technique is based on the ansatz 


Pstat {x, y) = exp 


oo 

-T^2 be Se{x,y) 

. 0 1=0 


(13) 


where b = B/K <7 1. One proceeds by considering 
{b\ + Ll'j p s tat(x, y) = 0 order-by-order in b. To leading 
order we find the Hamilton-Jacobi equation 


0 = [ 70 a; - Bh (y))]d x S 0 + [70 y ~ Bh (x)] d y S 0 
+ [ 70 a; + 70 y ~ Bh (x) - Bh (; y )] (d x S 0 ) (d y S 0 ) 

+ lox (d x S 0 ) 2 + 70 y (dyS 0 ) 2 
+ lox (d x S 0 ) 2 (dySo) + 70 y 0 d x s 0 ) {d y S 0 f , (14) 

where h(z) := H(Kz)/K. This equation is then nu¬ 
merically solved using the algorithm of Heymann and 
Vanden-Eijnden 34 . Results are shown in Fig. 8. Even 
though this only provides a first-order approximation and 
despite the fact that we have used b = 0.15 (which is not 
very small) we obtain a reasonable agreement with the 
stationary distribution in Fig. 5a. 

For completeness we have also carried out a WKB anal¬ 
ysis of the diffusion approximation of the GB, CB and 
NB models. These are presented in the Supplementary 
Information. 

The leading order function So (x, y ) is the so-called 
‘rate function’ which quantifies the rare-event statistics 
of the process in the weak-noise limit b <C l 35 ’ 36 . Several 
studies have suggested that So (x, y) is a suitable can¬ 
didate for a ‘landscape’ of the non-equilibrium random 
processes in models of gene regulatory networks 9-11,16,17 . 
The Hamilton-Jacobi equation (14) contains cubic terms 
such as (d x So ) 2 ( d y So ), while diffusion equations are 
quadratic in derivatives of So- This illustrates the funda¬ 
mental difference between the statistics of intrinsic noise 
in the diffusion approximation and the bursting noise in 
PDMP. Further more rigorous mathematical investiga¬ 
tions into these differences would be very welcome in our 
view. 

We compare the functions So of the PDMP and the 
diffusion approximation of the GB model in Fig. 8. 
One observes a much ‘shallower’ rate function in the 
PMDP model, especially at larger protein numbers 
(N X ,N Y ~ 700). This is due to the long tails in the 
exponential bursting kernel of the PDMP model, which 
are not present in the diffusion approximation of the 
GB model. Such a fat-tail bursting kernel enhances 
the probability for the system to evolve to high protein 
concentrations. We identify this as the origin of the 
qualitatively distinct rare-event statistics in the two 
models. 


a b 



FIG. 8. Rate functions So as functions of the protein numbers 
0 < N X ,N Y < 700 on a linear scale, (a) PDMP model; (b) 
Diffusion approximation of the GB model. 


Effects of bursting noise in a multi-switch net¬ 
work. Recently, multi-switch systems have gained 
interest 13,21,37 . A schematic diagram of the three-way 
switch network proposed by 13 is shown in Fig. 9a. It 
is obtained from the classical toggle switch network by 
including a self-enhancing autoregulation. Our computa¬ 
tional and mathematical setup requires only minor mod¬ 
ifications to include generalisation to this case. Specifi¬ 
cally, we replace the earlier Hill functions by 

c ( iVx,iv T )- w ( 1 + (jVx/ ;; ) ., + i ) 

X ( 1 + (JV Y /K 2 )»! + l)’ (15) 

with parameters 13 qo = 4, n = —4/5, 7*2 = 7/3, n\ = 3, 
ri 2 = 1, K\ = 160, and K 2 = 320. The rest of the param¬ 
eters follows Table I. The negative value of r± reflects the 
positive autoregulation. To evaluate the effects of burst¬ 
ing noise on this multi-switch model, we consider again 
the full model, the diffusion approximation of the GB 
model, as well as the CB and NB models of the extended 
network. 

Fig. 9 displays the stationary distribution to illustrate 
the effects of the bursting noise in the multi-switch net¬ 
work. The model without bursts (NB, panel f) has a 
stationary distribution consisting of three modes, as re¬ 
ported earlier 13 . Inclusion of constant bursts (CB, panel 
e) diversifies the protein expression and reduces the sta¬ 
bility of the mode located at N x = N Y ~ 230. In the full 
model (panel b) there is no discernible concentration of 
probability in the symmetric mode, hence the three-way 
switching capability appears to be absent. We also notice 
that the saddle of the distribution in the FM is located 
at a state with much lower number of proteins compared 
to the NB and CB models. The most likely switching 
path 9 from one of the asymmetric modes to the other 
will differ significantly between the different variants of 
the model. The diffusion approximation of the GB model 
(panel c) does not capture the outcome of the FM either. 


















9 


Overall these findings confirm again that the inclusion 
of bursty noise statistics has significant effects on the 
model outcome. Finally, we observe in Fig. 9d that the 
PDMP model approximates the full model of the three- 
way switch well. We conclude that randomly distributed 
burst sizes are again the predominant form of intrinsic 
noise in the multi-switch network. 


DISCUSSION AND CONCLUSION 

Explicitly including mRNA dynamics in gene regula¬ 
tory models inevitably introduces more complexity. We 
have quantitatively studied the effects of bursting noise 1 
in a biologically relevant regime or the model organism 
E. coli. To our knowledge, this is one of the first which 
attempts to build a rigorous connection between existing 
individual-based models 3,8 10 and more coarse-grained 
models 11,12,14,15 . Results of our simulations indicate that 
the bursting statistics of transcription and translation are 
essential ingredients of models of gene regulation. Coarse 
grained models need to account for bursting to retain 
correct statistics of noise-driven phenomena such as the 
switching between different dynamic attractors. 

The implications of our observations are relevant to 
the abstract modelling of regulatory networks in differ¬ 
ent ways. We are now in a better position to address 
our opening question, and to say how noise propagates 
between different levels of modelling. Perhaps more im¬ 
portantly, our study may ultimately help to decide what 
level of modelling is most appropriate to study gene regu¬ 
latory circuits computationally. The answer will of course 
depend on the question in the focus of the investigation. 
We have examined different levels of coarse graining, and 
we have identified the steps in these reduction procedures 
at which significant alternations to different model out¬ 
comes are introduced. 

Systematically choosing a suitable level of coarse- 
graining also facilitates the mathematical analysis of reg¬ 
ulatory networks. The high dimensionality of full regula¬ 
tory network effectively makes them intractable. Model 
reduction is needed to make progress, and our analysis 
demonstrates that the PDMP formulation is a powerful 
way forward, and that it can be more suitable than the 
conventional diffusion approximation. The PDMP model 
explicitly retains the bursting noise originating from the 
mRNA dynamics. Even though it effectively disregards 
the demographic noise from random degradation of the 
proteins, it delivers accurate predictions for stationary 
distributions and switching times. 

As another strength, the PDMP formulation can rel¬ 
atively easily be generalised to accommodate more com¬ 
plex reactions. For example, in the Enterobacteria phage 
A switch it is not the monomer of the synthesized proteins 
which acts as the repressor to regulate transcription, but 
instead their dimer. Modelling these processes requires 





1 


'0 


NB 


<K 


N x 


y. 


FIG. 9. (a) Schematic diagram illustrating the network of the 
three-way switch, remaining panels show stationary distribu¬ 
tion of protein numbers in the range 0 < Nx,Ny < 700 on 
linear scale, (b) Full model; (c) Diffusion approximation of 
the GB model; (d) PDMP approximation; (e) CB model; and 
(f)NB model. 


the inclusion of dimerization further downstream after 
transcription and translation 7,10 . Preliminary results not 
shown here reveal that the PDMP approximates such dy¬ 
namics well. 

The fact that the piecewise deterministic Markov pro¬ 
cess is successful in approximating the full model opens a 
relatively new type of modelling paradigm. We acknowl¬ 
edge that we are not the first to propose this 28,38 41 . Our 
contribution consists in a first analytical treatment of 
PDMP models and in an systematic embedding into a 
wider landscape of modelling approaches. The bursting 
phenomenon is ubiquitous whenever there is a separa¬ 
tion of time scales between the source and the product 
of a biological process. These are mRNA and protein in 
models of gene regulation, but we expect that these ideas 
can be applied to other biological problems with similar 
time-scale separation. 
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METHODS 

Sample paths of the individual-based processes (FM, 
CB, NB, and GB) are generated using the standard Gille¬ 
spie algorithm 26,27 . The PDMP process is simulated us¬ 
ing the algorithm discussed by Bokes et al 15 . Simulations 
of the stochastic differential equations resulting from the 
diffusion approximation are performed with a standard 
Euler-Maruyama scheme. The geometric minimum ac¬ 
tion method 34 is implemented using MATLAB R2010a, 
as is the finite-difference scheme to solve the backward 
equation (12). Further details can be found in the Sup¬ 
plementary Information. 
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I. NOTATION 

We briefly summarise the notation used in the main manuscript and in this supplement: 

• Discrete numbers of the two types of protein are denoted by Nx and Ny. We write Mx and 
My for the number of mRNA molecules of the two types. 

• Variables such as x(t ) denote continuous particle densities (or concentrations). Specifically 
x(t) and y(t) are protein densities, i.e., x = Nx/K and y = Ny/K in the limit K 1. 

• We denote the probabilities in the master equations by capital P (discrete particle numbers). 

• The lower-case notation p is used for probability density functions in the diffusion approxima¬ 
tion (continuous particle densities/concentrations). 

II. MASTER EQUATIONS OF THE INDIVIDUAL-BASED MODELS 

The different individual-based model in the main manuscript are uniquely defined by their master 
equations. Here we briefly summarise the master equations of the FM and of the GB, CB and NB 
models. 

A. Master equation of full model (FM) 

We write P a ,b,c,d for the probability that the system is in state Mx = a, My = 6 , Nx = c, Ny = d 
at time t. The master equation of the FM is then 

^j_Pa,b,c,d = — {H ( c ) + H (d) + ay [1 + B] + 67 [1 + B] — 70 c — yod} P a ,b,c,d 

+ H (c) P a ,b-l,c,d + H (d) Pa-l,b,c,d + 7 (a + 1 ) Pa+l,b,c,d + 7 (b + 1 ) P a ,b+l,c,d 
+ ByaP a ^ ?c _i^ + B'ybPafi^d-i + 7o (c + 1 ) P a ,b,c+i,d + 7o (d + 1 ) Pa,b,c,d- 1-1 • (i) 

The probability of a state is zero if any of the variables a, 6 , c or d are negative. 

B. Infinitely fast-degrading mRNA limit: the GB model 

In the kinetic scheme of the full model the mRNA decays with a rate 7 , but synthesizes a protein 
with a rate 7 B. Both of the rates are constants. One an mRNA is created the next event involving 
this mRNA particle is either the production of protein or the decay of the mRNA molecule. The 
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probability that the next event is the synthesis of a protein is B/(B + 1), and the probability that a 
decay occurs next (before production of a protein) is 1/(B + 1 ). The random number, £, of protein 
molecules generated by one particular mRNA molecule during its lifetime then follows a geometric 
distribution 



( 2 ) 


The lifetime of an mRNA molecule is of order 0 ( 1 / 7 ). As a consequence, we can think of the 
protein-generating process as follows in the infinitely-fast decaying mRNA limit (7 00 ): As soon 

as an mRNA is transcribed, it immediately releases a random number of proteins l drawn from the 
distribution ( 2 ) and then decays. 


C. Master equation of the GB model 

In the limit 7 00 , the dynamics of the full model can effectively be coarse-grained into a 

single-species model 


Gene X — ^ Y \ £ x Protein X (transcription and translation of protein X), 


Gene Y 


H(N X ) 


> l x Protein Y (transcription and translation of protein Y), 


Protein X 0 (degradation of protein X), 


Protein Y 0 (degradation of protein Y), 


(3a) 

(3b) 

(3c) 

(3d) 


where l is drawn from the above geometric distribution, every time one of the first two reactions 
fires. The master equation of this process is 


~^j_Pc,d — ~ [H (c) + H (d) + 70c + 70 d] P c ^ + 70 (c + 1) P c +i,d + 7 o (d + 1) P c ,d+ 1 

+ pw (ifs)'(rb) (^)' (^) p c4 ., (4) 


We have written P c ,d(f) for the probability that the system is in state Nx = c, Ny = d at time t. 
Again, the probability of a state is zero if c or d are negative. 
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D. Master equation of CB model 

The master equation for the CB model is obtained by replacing g(£) 5^^, be., £ takes value 

£ = B with probability one. We find 

j t Pc,d = - [H (c) + H (d) + 70 c + 70 d] P c4 

+ H (d) P c -B,d + H (c) P c ,d-B + 7o (c + 1) P c +i,d + 7o (d + 1) P c ,d+ 1 - (5) 

E. Master equation of the model without bursts (NB) 

In this case we have 

j t P c4 = - [BH (c) + BH (d) + 7oc + 70 d] P c4 

+ BH (d) P c -\,d + BH ( c ) Pc,d -1 + 7o (c + 1) Pc+ijd + 7o (d + 1) ^c,d+i- ( 6 ) 

III. DERIVING THE DIFFUSION APPROXIMATIONS 

A. Diffusion approximation of the GB model 

Simulations of the full model (FM) show that the number of mRNA molecules present at any 
one time is typically very small (Mx, My < 10) when biologically relevant parameters are used. The 
conventional diffusion approximation relies on large particle numbers, and so it is not adequate for 
the full model. 

Instead, we perform the diffusion approximation to the master equation of the GB model, equation 
(4). This is a standard method, and we proceed along the lines of [1]. The only complication is the 
presence of the geometrically distributed random numbers (denoted by £) in the protein-generation 
reactions. As we will discuss below this requires only modest modifications to the standard Kramers- 
Moyal expansion. 

We assume that the scale of the population size, A, is large but finite, i.e., K 1, and we write 
x = Nx/K and y = Ny/K, and replace P c ,d(t) in favour of p(x,y,t). The master equation (4) then 
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becomes 


dtp(x, y,t) = - [H (Kx ) + H (Ky ) + 7 oifx + 70 ^ 2 /] p (x, y, t) 

+ loK ^x + —^ p ^x + —, y, t'j + -)qK ^y + —^ p /r, y + —,t 

+|>(*»>(rf b) (rrB) f ’( a: “S' ! '’ t 


( 7 ) 


In the last two terms we have extended the summation over £ to infinity, terms in which x — t/K 
or y — £/K become negative are automatically suppressed as the corresponding probabilities p(x — 
£/K,y, t) and p(x, y — £/K,t) vanish. 

The above expression can then be written as 


dtp(x, y, t ) = - [H (Kx) + H (Ky) + 7 0 Kx + 70 Ky] p (x, y, t) 

+70K ( x + p ( x + + ^ K (y + j^) p ( x ^y + ^’ t 

+ H (Ky) (p (x - -f, y, +H (Kx) ^p /r, y - -f, t 


( 8 ) 


where (•••)/ denotes an average with respect to a geometrically distributed random number £, i.e., 

(ft)t = (1 + B) 1 YU ( itb ) ft- 

We next expand the above equation in powers of 1/if, keeping only the leading and sub-leading 
order terms [1]. We also use the explicit expressions (£)# = B and (£ 2 )i = B(2B +1) for the first two 
moments of the geometric distribution. 

We arrive at the Fokker-Planck equation 


dtp(x, y,t)= - d x 


—H (Ky) - 7 0 x 


P (x, y,t)> - d, 


—H (Kx) - 70 y 


p(x,y,t) 


2 K dx 

2K d y 


B(2B + l) 

K 

B(2B + 1) 
K 


H (Ky) + 7 0 x 
H (Kx) + 7 0 y 


P (x, y, t) 
p(x,y,t) J-, 


( 9 ) 


where we have written d x = ^ and similarly for d y . Realisations of the random process described 
by the Fokker-Planck equation (9) can be obtained as the solutions of the coupled Ito stochastic 
differential equations 


dx t = v(x u yt)dt + \JD(x u y t )dWf\ 
dy t — v(x t , yt)dt + ^/D(^~xi)dW^ v) 


t > 


( 10 a) 

( 10 b) 



with the defined drift v and diffusion D given by 


v(w, z) := B ( r 0 + 


D{w,z) := 


1 + z n 

(2 B + 1)( ro + 


low, 

r 

1 + z r 


+ -glow 


(x) (y) 

The quantities ’ and are independent Wiener processes. 


(lla) 

(lib) 


B. Diffusion approximation of the CB and NB models 

The same procedure can be applied to the master equation of the CB and NB models, and for 
completeness we report the resulting Fokker-Planck equations. 

For the CB model one finds 


d t p(x, y,t) — - d x 


B 


—H ( Ky) - j 0 x 


p (x, y,t) \ - a 


—H (. Kx ) - 70 y 


p(x,y,t) 


2K° X 

A* 

2 k v 


^H(Ky) + -^ 702 ; 


K 

1 ?H(Kx) + -t 7o y 


K 


p(x,y,t) | 
p(x,y,t)\, 


( 12 ) 


and for the model without bursts (NB) one has 


d t p(x, y,t)= - d x 


j^BH (Ky) - jox 


p (x, y,t)\ - a 


g BH (Kx) - 70 y 


p(x,y,t) 


2 K dx 

2K°y 


j^BH (Ky) + 7 0 x 
j^BH (Kx) + 7 0 y 


p(x,y,t) 

p(x,y,t) 


(13) 


IV. THE PIECEWISE DETERMINISTIC MARKOV PROCESS (PDMP) 


A. Construction of the PDMP model 


In this section we outline the construction of the PDMP approximation, starting from the full 
model. For this purpose it is useful to introduce the notation 

P^ b) (c, d, t) = P(M X = a, My = b, N x = c,N Y = d, t). (14) 

Thus the upper indices (a, b ) denote the number of mRNA molecules of either type in the system, 
and the arguments a, b stand for protein numbers. 
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The master equation (1) can then be written in matrix form 


P( 0,0 ) (c, d, t) 


P( 0,0 ) (c, d, t) 

P( 1,0 ) (c, d, t) 

II 

P( 1,0 ) ( c , d, t) 

P^ 0 ’ 1 ) (c, d, t) 

P^ 0 ’ 1 ) (c, d, t) 


We have introduced 


£ T := 


£t(o,o) _ H 7 

P(d) £t(b°) _ h(c) - H(d ) 

P(c) 0 


P(c) - i/(d) - 7 


, ( 16 ) 


where the operator describes the forward evolution of protein numbers when there are Mx — 

m and My = n mRNA molecules in the system. Specifically, 

£t(”».") = m-yB - l) + nyB (f 0 ’" 1 - l) + 7o [(.S 0 ’ 1 - l) + (S 0 ’ 1 - l)] , (17) 

where are the shift operators [1] acting on functions of protein numbers. They are defined through 

£ ZJ /(m, n) = f(m + i, n + j ). (18) 

Next we consider the limit of fast mRNA decay, that is, large values of 7. More specifically 
mRNA molecules of either type are generated with rates H(Ny) and H(Nx) respectively, and we 
assume that 7 is much larger than either of these two rates (7 H(Ny ), 7 H(Nx), for any values 
of Nx and Ny). In this limit, the system is almost always in the state without mRNA molecules 
(i.e.,Mx = 0, My = 0), except for short spells during which there is either one molecule of mRNA 
of type X, or one of type Y. The duration of the episodes spent in these (1,0) and (0,1) states is of 
order 7 -1 , then a switch back to the (0, 0) state occurs. The probability to find the system in states 
with Mx > 1 or My > 1 is even smaller, specifically of order (id/7) 2 , and we neglect contributions 
from these states. Equation (15) can then be simplified into a forward equation of a three-state 
model: 


p(°>°> (c, d, i) 

p(°>°> (c, d, i) 

P ( 1 ’ 0 ) M,i) = 4pprox 

pt 1 ’ 0 ) (c, d, i) 

pC 0 , 1 ) ( c _ d, t) 

pi 0 , 1 ) ( c _ d, t) 


£t(0,0) _ p( c ) _ 7 7 

H(d) £t(i,o) _ 7 0 

H(c) 0 £t(°,i) _ 7 
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In the next step we consider the limit of large values of K. Formally we take the limit K —>• oo. 
The system can then be described by the protein concentration x = Nx/K and y = Ny/K. The 
corresponding probability distributions in the continuum limit are 

p 0 (x, y ) := p(°>°) (Kx, Ky) K 2 , (21a) 

px(x, y) := P < 1 >°) (Kx, Ky) K 2 , (21b) 

Py(x, y) := P ((u) (Kx, Ky) K 2 . (21c) 

On the left-hand side we have introduced the notation 0, X and Y to describe the states in which 
there are no mRNA molecules (My = My = 0), one mRNA molecule of type X (My = 1, My — 0) 
and one mRNA molecule of type Y respectively (My = 0, My = 1). This is in-line with the notation 
in the main manuscript. 

The time evolution of the protein concentrations between the random switching events between 
these three states is then taken to be deterministic. Mathematically this corresponds to expanding 
the discrete operators [ n powers of K _1 , and keeping only the lowest-order advection terms. 

This generates so-called Lionville operators, and leads to 


d 

dt 

Po 


Po 

Px 

= (d+d) 

Px 


PY 


PY 


( 22 ) 


where and L\ are the forward operators driving the deterministic flow and the random switching 
between states, respectively. They are given by 



Li := 


«>„ ° ° 

0 ( L iL 0 • 

0 0 ( L d) 

\ a J 33 J 

—H(Kx) — H(Ky) 7 7 

H(Ky) -7 0 

H(Kx) 0 -7 


(23a) 


(23b) 


(T) u lod x (•'•) + 7o d y (y), (24a) 

( L d) 22 := ^ (~^ b + 70 ®) + 70 d y (y), (24b) 

( L d) 33 := (x ) + d y (-7 b + 70 y). (24c) 


and 
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Nature of the approximation 

In deriving Eq. (22) we have made several assumptions and approximations: 

(i) First, we have assumed that 7 /H ^ 1 , where H stands for the maximum value H(Nx) 

and H(Ny) can attain. We recall that H(N) = K ro + • The function h(pc) = 

r o + r/(l + x n ) does not involve K or 7 , and its maximum value is ro + r. In dimensionless 
units, the assumption 7 /iif 1 is thus fulfilled if 7 (r 0 + r)K. 

(ii) We have replaced the discrete operators by deterministic Liouville operators, i.e., we 

neglected demographic stochasticity of the protein degradation. The purpose of this is to isolate 
the contribution of the bursting noise, originating from the random switching of the mRNA 
state (0,X and Y). Making the deterministic approximation for the protein concentrations is 
formally valid only in the limit of very large protein populations, K 1 (K sets the scale of 
the numbers of protein molecules). 

In summary we assume 7 (ro + r)K and K ^ 1. We expect our approximations to be accurate 
when both of these are fulfilled, in particular the typical value of 7 above which our theory can be 
expected to be accurate will depend on the choice of if, which in turn must be chosen large enough 
to justify the deterministic approximation of the protein dynamics. 

The data in the main manuscript reveals that the mathematical approximation agrees well with 
simulations for 7 = 30 and K = 200. In our simulations we use ro ~ 0.007 and r = 0.06. 


B. Forward equation in the limit 7 — >> 00 

We start from 

dtPo = [lod x x + 70 d v y - H(Kx ) - H(Ky)] p 0 + 7 Px + IPy 
dtPx = [ d x (j 0 x - 7 b) + 70 d y -7 ]px + H ( Ky) p 0 , 
dtpy = [d y ( 70 y - 7 b) + 7(A - 7 ] Py + H (Kx) p 0 - 


(25a) 

(25b) 

(25c) 
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Applying the operator d y (^-y — b'j + — 1 d x (^-x — b^j + ^<9 y — 1 to both sides of equation 

(25a) results in 


7o 


7o . 




<9* ( —^ - b ) + —<9 y - i 

7 J 7 . 


7o . 




[%>x - if (A7/)p 0 ] + 


Po 

7o 


7o . 


a* ^ - H + -1 

7 7 7. 


+ 


7o 


7 


7o , 


«y 


7 


<9* ( —x - b ) + —<9 y - i 

7 7 7. 


[%> Y -tf(i^)po] 

[70^ + - if (iTr) - if(ify)] po¬ 

pe) 


We note that this equation is not closed in po- 

Next, we take the 7 —^ 00 limit, keeping in mind that if and 70 are finite. The system then 
almost-snrely stays in the 0-state, and consequently px,PY 0- Equation (26) then reduces to 

d t (- bd y - 1) (~bd x - l)p 0 = ( bd y + 1) [H(Ky)po\ + (bd x + 1) [H(Kx)p 0 ]. (27) 

The inverse operator of 1 + bd z is 


(1 + bd z ) 1 f(z) = j - ^ f(z')dz, 


(28) 


and so equation (27) turns into the ‘forward equation’ presented in the main text: 
dtPo = d x (joxpo) + d y ( 702 /po) - [H(Kx) + H(Ky)]p 0 

+ H(Ky)J ie _iL Ypo (x',y,t) dx' + H(Kx) J ^e~^p 0 (x,y',t) dy' . (29) 


C. Equations for the mean first switching time 


Here we illustrate the detail derivation to the adjoint equation in the main text. We focus on 
initial conditions y > x and our goal is to calculate the mean time it takes the dynamics to reach 
states with x — y. We write Tz(x,y) for the time it takes the dynamics to reach a state in which 
x = y if started from initial condition x,p, and in mRNA state Z £ {0,X, Y}. The Tz(x,y) then 
satisfy the following adjoint equation [ 1 ] 


1 


To(x,y) 

1 

— (Ld + L s ) 

Tx{x,y) 

1 


_ Ty(x, y) _ 


where Ld and L s are the adjoint operators of L y and L\. They are given by 


(L d ) n 

0 

0 


- H(Kx)-H(Ky ) H(Ky ) H(Kx ) 

0 

(I'd) 22 

0 

and L s = 

7 —7 0 

0 

0 

(I'd) 33 _ 


7 0 —7 


(30) 


(31) 
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with 


(Ld) 11 = - loxd x - 'foydy, 

(Ld) 22 = (7& - 7o^) d x - 70 (y) <9 y , 
(id) 33 = “ 7oa:<9x + (76 - 7 0 y) <9 y . 


(32a) 

(32b) 

(32c) 


In the infinitely fast degrading mRNA limit, 7 00 , equations (30) can be seen to converge to 

- 70 ^- 70 yd v -H(Kx)-H{Ky) H(Ky) H{Kx) 

1 bd T - 10 


1 


0 

II 

1 

0 

l 


0 


bd v 



To(x, y ) 


Tx(x,y) 


_ Ty(x, y) _ 


(33) 


The boundary conditions for the mean first exist times are determined by T% ( 27 , y&) = 0, for all 
locations ( 27 , y&) £ a f which the deterministic flow driven by flows out of the domain Q in state 
Z. Next, we specify a bounded domain Qq := {(*,y) : 0 < x < y,y < C}. The boundary conditions 
of equations (33) are then Tx(z,z) = 0 and Ty(z,C) = 0 Mz < C. We now use these boundary 
conditions, and integrate the second and the third components of the expression in equation (33). 
Subsequently we send C 00 and arrive at 

-1 = [-70*0* - 70 yd y - H(Kx) - H(Ky)} T 0 (x, y) 


py ———— p 00 ———— 

H(Ky) J e —f-T Q (x', y) dx ' + H(Kx) j ^-^T 0 (x, y') dx'. 


(34) 


V. WKB ANALYSIS 


A. WKB ansatz 


In order to find the qnasi-stationary distribution of the PDMP model and of the diffusion approx¬ 
imation of the GB and CB models, one uses the ansatz 

1 


Pstat = exp < - 


S 0 (x,y) + o( ] f 


(35) 


where e oc K 1 is the magnitude of the intrinsic noise in the protein dynamics. For the purposes of 
the WKB analysis the noise is assumed to be weak, i.e., e< 1. 

B. DA of the GB model 


In the context of the diffusion approximation of the GB model we use e = B/K. To leading order 
(O (K 0 )) one finds a Hamilton-Jacobi equation of the form 

0 = I (VSb) T D (VS 0 ) + v T • VS 0 . 


(36) 
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The vector v denotes the deterministic flow 

v(x,y) := 


§H(y)~ j 0 x 
# H(x)~ 70 y 


and the (scaled) diffusion matrix D is given by 

Dn{x,y) 0 


D (x,y) := 


0 D 2 2 (x, y) 


with entries 


‘IB -\-1 1 

Dn (x, y ) = — H(Ky) + — 70 a:, 

2iB 1 1 

D 22 (x, y ) = — H(Kx) + —Joy- 


(37) 


(38) 

(39a) 

(39b) 


C. DA of the CB model 


As before we use e = B / K . A similar leading-order calculation delivers the Hamilton-Jacobi 
equation, which is again of the form described in equation (36). The only differences are minor 
modifications in the diffusion matrix, which now has entries 


B 1 

£>11 (x, y ) = —H(Ky) + glox, 
B 1 

D 22 (x, y) = —H(Kx) + —70 y. 


(40a) 

(40b) 


D. DA of the NB model 


It is now convenient to use e = 1/A. Again one finds a Hamilton-Jacobi equation of the form as 
above. The diffusion matrix now has entries 


Dn (•'•• y) = —H(Ky) + j 0 x, 
D 2 2 (x, y) = —H(Kx) + joy- 


(41a) 

(41b) 


E. PDMP model 


For the PDMP model, similar calculations deliver the Hamilton-Jacobi equation 


0 = 


j3 

10 x - J7 H ( K 'y) 


9 x So + 


j3 

70 V - J7 H (Kx) 


+ 


B B 

J 0 X + 70 y ~ (Kx) - —H (Ky) 


dySo 

(d x So) ( dySo ) 


+ 7o^ (d x So) 2 + Joy (dySo) 2 + jox (d x So) 2 (d y So) + joy (d x So) (d y So) 2 ■ 


(42) 
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VI. NUMERICAL METHODS 

Sample paths of the individual-based processes (FM, CB, NB, and GB) are generated by the 
standard kinetic Monte Carlo algorithm [2, 3] implemented in C++. The PDMP process is simulated 
using the algorithm proposed by Bokes et al [4]. Simulations of the diffusion approximations are 
performed using the standard Euler-Maruyama algorithm with a constant time step St = 10 -4 . In all 
cases 10 6 sample paths are simulated for a sufficiently long time to measure stationary distributions. 
For the mean first switching times, we sample 105 initial states on a lattice on the domain 0 < 
iVx(O) < 7Vy(0) < 700. For each initial state, we simulate 10 4 sample paths, each until they cross 
the boundary Nx = Ny to measure the mean first switching times. 

The geometric minimum action method proposed by Heymann and Vanden-Eijnden [5] is imple¬ 
mented using MATLAB R2010a, and used to find the rate function So of the WKB method. For 
each model, we sample at least 150 end points and solve for the least-action paths, discretized into 
257 equidistant points, connecting one of the fixed points and the end point. The final landscapes 
are generated by linear interpolation of the rate functions so obtained. 

The finite-difference scheme to solve the adjoint equation was implemented in MATLAB R2010a, 
discretizing the domain 0 < x, y < C = 2000 into 150 x 150 grid points. The adjoint equation is then 
transformed to a set of 22500 linear equations, which is solved using a built-in numerical solver in 
MATLAB R2010a. 



Num. Protein Y Num. Protein X Num. mRNAY Num. mRNAX 
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VII. SAMPLE PATHS OF THE DIFFERENT MODELS 

A. Full model 




FIG. 1. One sample path of the full model (FM). Left panel: short time scale. Right panel: the protein 
expressions switches at a longer time scale driven by intrinsic noise. 
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B. Geometrically distributed burst model (GB) 



FIG. 2. One sample path of the model with geometrically distributed burst size (GB). Left panel: short time 
scale. Right panel: long time scale. 


C. Constant burst model (CB) 
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FIG. 3. One sample path of the model with constant bursts (CB). Left panel: short time scale. Right panel: 
long time scale. 
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D. No-burst model (NB) 



FIG. 4. One sample path of the model without bursts. Left panel: short time scale. Right panel: long time 
scale. In 1000 cell cycles, we observe no switching event in this sample path. 


E. PDMP model 





FIG. 5. One sample path of the PDMP. Left panel: short time scale. Right panel: long time scale. 
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F. Diffusion approximation of the GB model 
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FIG. 6. One sample path of the diffusion approximation of the GB model. Left panel: short time scale. Right 
panel: long time scale. 


G. Diffusion approximation of the CB model 



FIG. 7. One sample path of the diffusion approximation of the CB model. Left panel: short time scale. Right 
panel: long time scale. 
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H. Diffusion approximation of the NB model 




FIG. 8. One sample path of the diffusion approximation of the single-stage model without bursts. Left panel: 
short time scale. Right panel: long time scale. Similar to the NB model, no switching event occurs in 1000 
cell cycles in this sample path. 
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VIII. COMPARISON OF STATIONARY DISTRIBUTIONS 
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FIG. 9. Stationary distribution measured in simulations. All axes show 0 < Nx,Ny < 700 on a linear scale. 
Insets show the distribution as viewed from the point Nx = Ny = 700 facing towards the origin, (a) Full 
model; (b) PDMP; (c) GB model; (d) Diffusion approximation (DA) of GB; (e) CB model; (f) DA of CB; (g) 
NB model; (a) DA of NB. The same colour scale is used in all panels, except for panels e and f. 
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IX. COMPARISON OF THE MEAN FIRST SWITCHING TIMES 
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FIG. 10. Mean first switching times. All graphs show 0 < Nx,Ny < 700 on linear scales, (a) Full model; (b) 
PDMP; (c) GB model; (d) Diffusion approximation (DA) of GB; (e) CB model; (f) Numerical solution of the 
adjoint equation of the PDMP. Data are plotted on the same colour scale in all panels to allow comparison. 
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X. COMPARING WKB RESULTS 



FIG. 11. Results from the WKB analysis. Panels a, c, e, and g show the WKB rate functions So(Nx,Ny), 
and panels b, d, f, and h the corresponding approximation for the stationary probability distribution 
J\f exp [—So (Ax, Ay) /e\ where J\f is the normalisation factor. All panels show 0 < Nx,Ny < 700 on a 
linear scale. The insets show the stationary distributions viewed from (Nx,Ny) = (700,700). (a, b) PDMP; 
(c, d) DA of GB ; (e, f) DA of CB; (g, h) DA of NB. 
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